Validating Machine 
Learning-Based Systems 
Why Not Use a Traditional Driver Test?
■ Written test
  • Does ADS know traffic laws?
  • Does ADS know behavioral expectations?
■ Road test
  • Can ADS execute traffic laws?
  • Can ADS negotiate effectively with human drivers?
  • Does ADS exhibit good driver hygiene?
  • Can ADS resolve potentially ambiguous driving situations?
■ Being a 16-year-old human
  • How do we measure ADS judgment maturity?
  • Does the ADS know when it doesn't know what to do?
Machine Learning Challenges


■ Inductive learning
  • Collect lots of training data
  • Adjust learned model; iterate
  • Declare success when tests pass
[Comic: "This is your machine learning system?" ... "pile of linear algebra" ... "the answers on the other side." "What if the answers are wrong?" "Just stir the pile until ..."]


■ Fundamental challenges:
  • Assurance on novel inputs
    - What if it overfitted to the data?
    - Gaps in training data
  • Did it learn what you hoped?
    - Prone to "gaming" the learning
  • What was actually learned?
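One crude way to ask whether an input is "novel" is to check whether any feature falls outside the range seen during training. The sketch below assumes numeric feature vectors. The class name `RangeMonitor` and the feature values are hypothetical. It misses novel combinations of in-range values, so at best it partially guards against training-data gaps.

```python
from typing import Sequence


class RangeMonitor:
    """Hypothetical novelty check: flags inputs outside per-feature training ranges.

    Deliberately simple: it cannot detect novel combinations of in-range values.
    """

    def __init__(self, training_inputs: Sequence[Sequence[float]]):
        if not training_inputs:
            raise ValueError("need at least one training input")
        columns = list(zip(*training_inputs))  # one tuple per feature
        self.lows = [min(c) for c in columns]
        self.highs = [max(c) for c in columns]

    def is_novel(self, x: Sequence[float]) -> bool:
        """True if any feature lies below its training minimum or above its maximum."""
        if len(x) != len(self.lows):
            raise ValueError("feature count mismatch")
        return any(v < lo or v > hi for v, lo, hi in zip(x, self.lows, self.highs))


# Made-up two-feature training data for illustration only.
train = [[0.2, 10.0], [0.5, 25.0], [0.9, 40.0]]
monitor = RangeMonitor(train)
print(monitor.is_novel([0.4, 30.0]))  # False
print(monitor.is_novel([0.4, 80.0]))  # True
```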
Traditional Validation vs. Machine Learning
■ Use traditional software safety where you can
[Figure: "V" model of development over Time. Project Definition leg: Requirements and Architecture, Detailed Design. Project Test and Integration leg: Integration, Test, and Verification; System Verification and Validation; Operation.]

...BUT...
■ Machine Learning (inductive training)
  • No requirements
    - Training data is difficult to validate
  • No design insight
    - Generally inscrutable; prone to gaming and brittleness
Validation via Brute-Force Road Testing?

■ If 200M miles per critical mishap...
  • Test 3x-10x longer than mishap rate
  => Need 2 billion miles of testing
■ That's ~50 round trips on every road in the world
  • With fewer than 10 critical mishaps
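The arithmetic above is 10 x 200M = 2 billion miles. The sketch below adds one way to read the 3x end of the range. It assumes critical mishaps arrive as a Poisson process, which is not stated on the slide. It computes the exact one-sided upper confidence bound on the mishap rate from the observed mishaps and miles. With zero mishaps, the 95% bound is about 3 divided by the miles driven. So a mishap-free run of 3x the target interval just barely supports the 1-in-200M claim. The function names are illustrative, and the stdlib-only bisection is meant for small mishap counts.

```python
import math


def required_test_miles(miles_per_mishap: float, multiplier: float) -> float:
    """Test exposure when testing `multiplier` times the mean distance between mishaps."""
    return miles_per_mishap * multiplier


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean); adequate for small k and moderate mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def mishap_rate_upper_bound(mishaps: int, miles: float, confidence: float = 0.95) -> float:
    """Exact one-sided Poisson upper confidence bound on mishaps per mile."""
    alpha = 1.0 - confidence
    lo, hi = 0.0, 10.0 * (mishaps + 1) + 10.0  # bracket for the expected mishap count
    for _ in range(200):  # bisection: the CDF decreases as the mean grows
        mid = (lo + hi) / 2.0
        if poisson_cdf(mishaps, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi / miles


miles = required_test_miles(200e6, 10)
print(f"{miles:.0f}")  # 2000000000
bound = mishap_rate_upper_bound(0, required_test_miles(200e6, 3))
print(bound < 1 / 200e6)  # True
```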








= And what if the answer is: 


  • Safer, but expensive
  • Not scalable
  • Highly scalable; fidelity vs. cost tradeoff
  • Need to build highly detailed models
  • Challenge of matching real-world data into simulation models
■ Scenarios must cover the Operational Design Domain (ODD)

NHTSA-inspired pre-crash scenarios
We have selected 10 traffic scenarios from the NHTSA pre-crash typology to inject challenging driving situations into traffic patterns encountered ...
Traffic Scenario 01: Control loss without previous action
  • Definition: Ego-vehicle loses control due to bad conditions on the road and it must recover, coming back to its original lane.
Traffic Scenario 02: Longitudinal control after leading vehicle's brake
  • Definition: Leading vehicle decelerates suddenly due to an obstacle and ego-vehicle must react, performing an emergency brake or an avoidance maneuver.
Traffic Scenario 03: Obstacle avoidance without prior action
  • Definition: The ego-vehicle encounters an obstacle / unexpected entity on the road and must perform an emergency brake or an avoidance maneuver.
Traffic Scenario 04: Obstacle avoidance with prior action
  • Definition: While performing a maneuver, the ego-vehicle finds an obstacle / unexpected entity on the road and must perform an emergency brake or an avoidance maneuver.
https://carlachallenge.org/challenge/nhtsa/

THE PEGASUS METHOD
PEGASUS Method for Assessment of Highly Automated Driving Function (HAD-F), V1.5 Status, 14.02.2019
[Figure: PEGASUS method diagram with labels including Data / Content, Database, Processing, Space of Logical Test Cases, Integration, and Pass Criteria]
https://www.pegasusprojekt.de/en/pegasus-method
Simulation Components
Simulation Validity





■ Fidelity & qualification
  • Environment; road users
  • Perception as well as vehicle motion
  • Appropriate safety metrics
  • Tool & model qualification
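Time-to-collision (TTC) is one commonly used surrogate safety metric for car-following situations. TTC is the current gap divided by the closing speed, assuming both vehicles hold their speeds. Whether TTC is "appropriate" for a given ODD is exactly the open question raised above. The sketch below is for illustration only. It computes the minimum TTC over a short made-up simulation trace. The function names and the trace format, (gap, ego speed, lead speed) samples, are hypothetical.

```python
import math
from typing import Iterable, Tuple


def time_to_collision(gap_m: float, ego_speed_mps: float, lead_speed_mps: float) -> float:
    """Constant-velocity TTC in seconds; math.inf when the ego vehicle is not closing in."""
    if gap_m < 0:
        raise ValueError("gap must be non-negative")
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def min_ttc(trace: Iterable[Tuple[float, float, float]]) -> float:
    """Smallest TTC over a trace of (gap_m, ego_speed_mps, lead_speed_mps) samples."""
    return min(time_to_collision(g, v_ego, v_lead) for g, v_ego, v_lead in trace)


# Made-up samples: TTC values are 6.0 s, 2.0 s, and 1.5 s.
trace = [(30.0, 20.0, 15.0), (20.0, 20.0, 10.0), (9.0, 18.0, 12.0)]
print(min_ttc(trace))  # 1.5
```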







■ Experimental design
  • Coverage of ODD & high-risk edge cases
  • Matching simulated scenario to real-world scenario
  • Experimental design for validation of simulation itself


CARLA https://youtu.be/2c-KIQ8SFcc 


"All models are wrong, but some are useful." - George Box
What Does It Mean for a Test to Pass?
■ Traditional test paradigm:
  • You think the design is right
  • Test validates that engineering was done properly
    - Test traces to requirements/design





https://goo.gl/cFCknY 


■ Inductive training test paradigm:
  • You think the system was trained properly
  • Test determines whether training worked
    - Weak traceability to test set, if any
    - Hope to detect training data gaps, overfitting
  • BUT: nondeterministic, opaque "design"
    - Robust test plan is essential
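One routine check under the inductive paradigm compares accuracy on the training data with accuracy on held-out data. A large drop hints at overfitting. The sketch below uses an arbitrary `max_gap` threshold and made-up predictions and labels. Passing this check says nothing about whether the model learned what you hoped.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def overfitting_suspected(train_acc: float, held_out_acc: float, max_gap: float = 0.05) -> bool:
    """True when held-out accuracy falls more than `max_gap` below training accuracy."""
    return (train_acc - held_out_acc) > max_gap


# Made-up predictions and labels for illustration.
train_acc = accuracy([1, 0, 1, 1], [1, 0, 1, 1])     # 1.0
held_out_acc = accuracy([1, 0, 0, 0], [1, 1, 0, 0])  # 0.75
print(overfitting_suspected(train_acc, held_out_acc))  # True
```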
Changing Validation Approaches





■ Machine Learning (ML) breaks the "V"
■ Simulation validity (including models & test plan)
■ Are you simulating perception (the hardest part)?